- Title
- CyTex: Transforming speech to textured images for speech emotion recognition
- Creator
- Bakhshi, Ali; Harimi, Ali; Chalup, Stephan
- Relation
- Speech Communication, Vol. 139 (April 2022), pp. 62-75
- Publisher Link
- http://dx.doi.org/10.1016/j.specom.2022.02.007
- Publisher
- Elsevier
- Resource Type
- journal article
- Date
- 2022
- Description
- Speech emotion recognition is an important aspect of emotional state recognition in human–machine interaction. Approaches using speech-to-image transforms have become popular in recent years because they can utilise deep neural network models that have proven to be successful in the image processing domain. In this paper, we propose a new speech-to-image transform, CyTex, that maps the raw speech signal directly to a textured image by using calculations based on the fundamental frequency of each speech frame. The textured RGB images resulting from the CyTex transform can then be classified using standard deep neural network models for the recognition of different classes of emotion. Using this approach, we can report an improvement of classification accuracies over the previous state-of-the-art results by 0.81% for the Emo-DB database, and also by 0.5% for the IEMOCAP database.
- Subject
- speech emotion recognition; speech to textured-image transform; deep neural network; human-machine interaction
- Identifier
- http://hdl.handle.net/1959.13/1485681
- Identifier
- uon:51669
- Identifier
- ISSN:0167-6393
- Language
- eng